Categorical Variables Recap and Some Ethics…

STAT 313

What are the two data types R stores categorical variables as?
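As a quick illustration of the question above: categorical data in R is typically held either as a character vector or as a factor. The sketch below uses made-up values (the names `sections` and `sections_fct` are hypothetical, not from the course data).

```r
# Hypothetical values, for illustration only
sections <- c("clear cut", "old growth", "clear cut")

class(sections)        # character: plain text values

sections_fct <- factor(sections)
class(sections_fct)    # factor: text values plus a fixed set of levels
levels(sections_fct)   # the distinct values, in sorted order by default
```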

dplyr – a tool bag for data wrangling

filter()

select()

mutate()

summarize()

arrange()

group_by()

The Pipe %>%
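A minimal sketch of how the verbs listed above can be chained with the pipe. The `fish` tibble and its column names are invented for illustration and are not the course data set.

```r
library(dplyr)

# Hypothetical toy data, not the course data set
fish <- tibble(
  species   = c("trout", "trout", "salamander", "salamander"),
  length_mm = c(80, 90, 40, 50),
  weight_g  = c(5, 7, 2, 3)
)

fish_summary <- fish %>%
  filter(length_mm > 45) %>%                 # keep rows meeting a condition
  select(species, length_mm) %>%             # keep only these columns
  mutate(length_cm = length_mm / 10) %>%     # add a new column
  group_by(species) %>%                      # later steps run per species
  summarize(mean_cm = mean(length_cm)) %>%   # collapse to one row per group
  arrange(desc(mean_cm))                     # sort rows, largest mean first

fish_summary
```

Each step takes the data frame produced by the previous step as its first argument, which is what lets the chain read top to bottom.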

If you wanted means for each level of a categorical variable, what would you do?

Trout Size

The HJ Andrews Experimental Forest houses one of the largest long-term ecological research stations, which studies cutthroat trout and salamanders in clear cut and old growth sections of Mack Creek.


trout %>%
  group_by(section) %>%
  summarize(mean_length = mean(length_1_mm, na.rm = TRUE))
# A tibble: 2 × 2
  section                               mean_length
  <chr>                                       <dbl>
1 clear cut forest                             85.3
2 upstream old growth coniferous forest        81.4


Why na.rm = TRUE?
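One way to see the answer is to compare the two calls on a small vector with a missing value. The vector `lengths_mm` below is hypothetical.

```r
# Hypothetical lengths with one missing measurement
lengths_mm <- c(82, NA, 90)

mean(lengths_mm)                 # NA: a missing value makes the mean unknown
mean(lengths_mm, na.rm = TRUE)   # 86: NAs are dropped before averaging
```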

Classifying Channel Types

The sampled channels of Mack Creek were classified into the following groups:

"C"    cascade
"I"    riffle
"IP"   isolated pool
"P"    pool
"R"    rapid
"S"    step (small falls)
"SC"   side channel
NA     not sampled by unit
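If readable labels are needed instead of the codes, one possible approach is a named-vector lookup. This is only a sketch: the names `channel_labels`, `codes`, and `labels` are hypothetical, and the mapping is copied from the list above.

```r
# Lookup from channel code to description, copied from the list above
channel_labels <- c(
  C  = "cascade",
  I  = "riffle",
  IP = "isolated pool",
  P  = "pool",
  R  = "rapid",
  S  = "step (small falls)",
  SC = "side channel"
)

# Hypothetical codes, for illustration only
codes <- c("P", "SC", NA, "C")

# Index the lookup by name; a missing code stays NA
labels <- unname(channel_labels[codes])
labels
```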

filter()-ing Specific Channel Types

The majority of the cutthroat trout were captured in cascades (C), pools (P), and side channels (SC). Suppose we want to retain only these levels of the unittype variable.


trout %>% 
  filter(unittype %in% c("C", "P", "SC"))


%in%

If your filter includes more than one value, you must use %in%, not ==!
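A small sketch of why this matters, using a hypothetical vector rather than the trout data. With ==, R compares element by element and recycles the shorter vector, so matches are found only where positions happen to line up.

```r
# Hypothetical unit types, for illustration only
unittype <- c("P", "C", "SC", "C", "P", "R")
keep <- c("C", "P", "SC")

# == compares position by position, recycling `keep` along `unittype`
sum(unittype == keep)     # 3: matches only where positions line up

# %in% asks whether each value appears anywhere in `keep`
sum(unittype %in% keep)   # 5: every value except "R"
```

Because the == version runs without an error here, rows can be dropped silently, which makes the mistake easy to miss.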

Categorical Variables for Whom?

Suppose Cal Poly is interested in summarizing the demographics of its undergraduate students. It has designed the following question asking about students' gender identity:


What is your gender identity?

Male, Female, Other

Who benefits from these options?

Who suffers from these options?

Data Feminism

  • Data science by whom?

  • Data science for whom?

  • Data sets about whom?

  • Data science with whose values?

Rethink binaries



How would you redesign the survey question about students' gender identity?

Challenge power

An aura of objectivity

“We focus on four conventions which imbue visualizations with a sense of objectivity, transparency and facticity. These include: (a) two-dimensional viewpoints, (b) clean layouts, (c) geometric shapes and lines, (d) the inclusion of data sources.”

The work that visualization conventions do

Elevate emotion

https://guns.periscopic.com/